Abstract


A class of asynchronous iterative methods is presented for solving a system of
equations. Existing iterative methods are identified in terms of asynchronous iterations,
and new schemes are introduced corresponding to a parallel implementation on a
multiprocessor system with no synchronization between cooperating processes. A
sufficient condition is given to guarantee the convergence of any asynchronous
iteration, and results are extended to include iterative methods with memory.

Asynchronous iterative methods are then evaluated from a computational point of
view, and bounds are derived for the efficiency. The bounds are compared with actual
measurements obtained by running various asynchronous iterations on a multiprocessor,
and the experimental results show clearly the advantage of purely asynchronous iterative
methods.
1 - Introduction


In this paper we investigate the fixed point problem for an operator $F$ from $\mathbb{R}^n$
into itself: we want to find a vector $x$ in $\mathbb{R}^n$ which satisfies the system of equations
represented by

$$x = F(x). \tag{1.1}$$


In [1], Chazan and Miranker introduced the chaotic relaxation scheme, a class of
iterative methods for solving equation (1.1) where $F$ is a linear operator given by
$F(x) = Ax + b$. They showed that iterations defined by a chaotic relaxation scheme
converge to the solution of equation (1.1) if and only if $\rho(|A|) < 1$. (If $M$ is a real
$n \times n$ matrix, $\rho(M)$ denotes its spectral radius and $|M|$ denotes the non-negative $n \times n$ matrix
obtained by replacing the elements of $M$ by their absolute values.)

In  [4],  Miellou  generalized  the  chaotic  relaxation  scheme  to  include  non-linear 
operators  and  obtained  convergence  results  similar  to  those  of  [1]  in  the  case  of 
contracting  operators  (see,  for  example,  [5,  p.  433]). 

In both [1] and [4], the motivation for defining chaotic relaxation is to account for
the  parallel  implementation  of  iterative  methods  on  a multiprocessor  system  so  as  to 
reduce  communication  and  synchronization  between  the  cooperating  processes.  This 
reduction  is  obtained  by  not  forcing  the  processes  to  follow  a predetermined  sequence  of 
computations,  but  simply  by  allowing  a process,  when  starting  the  evaluation  of  a new 
iterate,  to  choose  dynamically  not  only  the  components  to  be  evaluated  but  also  the 
values  of  the  previous  iterates  used  in  the  evaluation. 

The  definition  of  the  chaotic  relaxation  scheme  does  not,  however,  allow  for  a 
completely  arbitrary  choice  of  the  antecedent  values  used  in  the  evaluation  of  an  iterate. 
The main restriction is that there must exist a fixed positive integer $s$ such that, in
carrying out the evaluation of the $i$-th iterate, a process cannot make use of any value of
the components of the $j$-th iterate if $j < i-s$. For example, if, for some reason (due to the
computation itself or to the multiprocessor system), a process may take an arbitrarily long
time  to  relax  the  components  it  is  evaluating,  the  other  processes  may  have  to  wait  until 
the  evaluation  by  the  first  process  is  completed.  This  requires  repeated  checking  before 
each step of the iteration and some form of synchronization. This is exactly what we
want to avoid, because the use of synchronization primitives is time consuming and also
because synchronization forces some of the processors to be idle or implies the
switching of context. This creates an unnecessary overhead, reduces the parallelism, and
decreases the maximum speed-up we expect to achieve in using a multiprocessor.




In the next section we introduce the class of asynchronous iterative methods which
does not impose the restriction mentioned above, and we show that existing iterative
methods (and, in particular, the chaotic relaxation) can be represented as special cases of
asynchronous iterations. Section 3 gives the definition and reviews some properties of
contracting operators. Then the theorem of section 4 generalizes the results on the
convergence of the chaotic relaxation obtained by Chazan and Miranker [1] and by
Miellou [4]. This result is further extended, in section 5, to include iterative methods
with memory. In section 6, we consider the complexity of asynchronous iterative
methods, and we derive bounds on the efficiency. These bounds are then compared with
actual measurements of asynchronous iterations. The experimental results, presented in
section 7, show a considerable advantage for iterations making no use of synchronization,
and this constitutes the best argument for using asynchronous iterative methods.
Possible extensions of the results are discussed in section 9, and concluding remarks are
presented in the last section.

2 - The class of asynchronous iterative methods

The following notations will be used throughout the paper. If $x$ is a vector of $\mathbb{R}^n$,
its components will be denoted by $x_i$, $i = 1, \ldots, n$. To avoid confusion, a sequence of
vectors of $\mathbb{R}^n$ will be denoted by $x(j)$, $j = 0, 1, \ldots$. If $F$ is an operator of $\mathbb{R}^n$ into itself,
$F(x)$ will also be represented in components by $f_i(x)$ or by $f_i(x_1, \ldots, x_n)$, $i = 1, \ldots, n$. We
denote by $\mathbb{N}$ the set of all non-negative integers.

2.1  - Definition  of  asynchronous  iterative  methods 
Definition 1:

Let $F$ be an operator from $\mathbb{R}^n$ to $\mathbb{R}^n$. An asynchronous iteration corresponding
to the operator $F$ and starting with a given vector $x(0)$ is a sequence $x(j)$, $j = 0, 1, \ldots$,
of vectors of $\mathbb{R}^n$ defined recursively by:

$$x_i(j) = \begin{cases} x_i(j-1) & \text{if } i \notin J_j \\ f_i\bigl(x_1(s_1(j)), \ldots, x_n(s_n(j))\bigr) & \text{if } i \in J_j \end{cases} \tag{2.1}$$

where $\mathcal{J} = \{ J_j \mid j = 1, 2, \ldots \}$ is a sequence of non-empty subsets of $\{1, \ldots, n\}$ and
$\mathcal{S} = \{ (s_1(j), \ldots, s_n(j)) \mid j = 1, 2, \ldots \}$ is a sequence of elements in $\mathbb{N}^n$.

In addition, $\mathcal{J}$ and $\mathcal{S}$ are subject to the following conditions:
for each $i = 1, \ldots, n$

(a) $s_i(j) \le j-1$, $j = 1, 2, \ldots$,

(b) $s_i(j)$, considered as a function of $j$, tends to infinity as $j$ tends to infinity,

(c) $i$ occurs infinitely often in the sets $J_j$, $j = 1, 2, \ldots$.

An asynchronous iteration corresponding to $F$, starting with $x(0)$ and defined by
$\mathcal{J}$ and $\mathcal{S}$ will be denoted by $(F, x(0), \mathcal{J}, \mathcal{S})$.
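
The recursion of definition 1 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: components are indexed from 0, the function name `asynchronous_iteration`, the example operator `F`, and the particular update sets and delays are all hypothetical choices, not taken from the paper.

```python
from typing import Callable, List, Sequence, Set

Vector = List[float]


def asynchronous_iteration(
    F: Callable[[Vector], Vector],
    x0: Vector,
    update_sets: Sequence[Set[int]],   # J_1, J_2, ... (0-based component indices)
    delays: Sequence[Sequence[int]],   # (s_1(j), ..., s_n(j)) for j = 1, 2, ...
) -> List[Vector]:
    """Return the iterates x(0), x(1), ... of the asynchronous iteration."""
    n = len(x0)
    history = [list(x0)]
    for j, (J_j, s_j) in enumerate(zip(update_sets, delays), start=1):
        assert J_j, "each J_j must be non-empty"
        assert all(0 <= s <= j - 1 for s in s_j), "condition (a): s_i(j) <= j-1"
        # z_i = x_i(s_i(j)): component i is read from a possibly old iterate.
        z = [history[s_j[i]][i] for i in range(n)]
        Fz = F(z)
        x_new = list(history[-1])      # components outside J_j are unchanged
        for i in J_j:
            x_new[i] = Fz[i]           # x_i(j) = f_i(z)
        history.append(x_new)
    return history


# Hypothetical example: F(x) = Ax + b with rho(|A|) < 1, fixed point (12/7, 10/7).
def F(x: Vector) -> Vector:
    return [0.5 * x[1] + 1.0, 0.25 * x[0] + 1.0]


steps = 200
update_sets = [{(j - 1) % 2} for j in range(1, steps + 1)]
# Bounded but irregular delays: each value read is at most 3 iterates old.
delays = [[max(0, j - 1 - ((7 * j + 3 * i) % 4)) for i in range(2)]
          for j in range(1, steps + 1)]
x_final = asynchronous_iteration(F, [0.0, 0.0], update_sets, delays)[-1]
```

With these choices conditions (a), (b) and (c) hold, and `x_final` ends up close to the fixed point of the example operator.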


An asynchronous iteration $(F, x(0), \mathcal{J}, \mathcal{S})$ may be thought of as corresponding to the
following sequence of computations on an asynchronous multiprocessor.

Assume we have a pool of processors available. Let $t_j$, $j = 1, 2, \ldots$, be an
increasing sequence of time instants. At time $t_j$ a processor $P$ is idle and is assigned to
the evaluation of the iterate $x(j)$; $x(j)$ differs from $x(j-1)$ by the set of components
$\{ x_i \mid i \in J_j \}$, and $P$ starts computing these components using values of components
known from previous iterates, namely the $r$-th component of the $s_r(j)$-th iterate, for
$r = 1, \ldots, n$. The choice of the components may be guided by any criterion, and, in
particular, a natural criterion is to pick up the most recently available values of the
components. This scheme does not need any synchronization between the processes. At
some time $t_k$ later on ($k > j$), $P$ will finish its computations and will be assigned to a new
evaluation: $x(k)$.


The  use  of  asynchronous  iterative  methods  is  obviously  not  restricted  to 
multiprocessor  systems,  and  the  scheme  is  also  well  suited  for  execution  on  a network  of 
computers, in particular when the communication between elements of the network is not
too  expensive  as  opposed  to  the  computation  itself. 


We  notice  that,  in  the  evaluation  of  an  iterate,  nothing  is  imposed  on  the  use  of  the 
values  of  the  previous  iterates.  The  only  thing  required,  by  condition  (b)  of  the 
definition,  is  that,  eventually,  the  values  of  an  early  iterate  cannot  be  used  any  more  in 
further  evaluations  and  more  and  more  recent  values  of  the  components  have  to  be  used 
instead.  On  a multiprocessor,  this  condition  can  be  satisfied  as  long  as  no  processor 
crashes  (and  eventually  completes  its  computation). 


Condition (a) of the definition states the fact that only components of previous
iterates can be used in the evaluation of a new iterate. Condition (c) guarantees that no
component is abandoned forever.

2.2  - Examples  and  particular  cases  of  asynchronous  iterations 

Classical iterative methods (point or block Jacobi, Gauss-Seidel, etc.), as well as
others introduced more recently (chaotic relaxation scheme [1], periodic chaotic scheme
[2], itération chaotique à retards [4], itération chaotique série-parallèle [6]), can all be seen
as particular cases of asynchronous iterations.

For example, the point-Jacobi method defined on the operator $F$ with the initial
approximation $x(0)$ can be represented by the asynchronous iteration $(F, x(0), \mathcal{J}, \mathcal{S})$ where
$\mathcal{J}$ and $\mathcal{S}$ are defined by:

$$J_j = \{ 1, \ldots, n \} \quad \text{for } j = 1, 2, \ldots,$$

$$s_i(j) = j-1 \quad \text{for } j = 1, 2, \ldots \text{ and } i = 1, \ldots, n.$$

The same point-Jacobi method can equivalently be represented by the
asynchronous iteration where $\mathcal{J}$ and $\mathcal{S}$ are defined by:

$$J_j = \{ 1 + ((j-1) \bmod n) \} \quad \text{for } j = 1, 2, \ldots,$$

$$s_i(j) = n \lfloor (j-1)/n \rfloor \quad \text{for } j = 1, 2, \ldots \text{ and } i = 1, \ldots, n.$$

Although those two representations correspond to the same point-Jacobi method,
they differ by the implicit information they contain about the decomposition of the
computations. In the first case, all components are evaluated at once and this,
presumably, will be done by one computational process. In the second case, however,
each component is evaluated separately, and up to $n$ processes can be used to perform
the evaluations. Between the two extreme representations of the point-Jacobi method, in
terms of asynchronous iterations, several others can be proposed, each of which can be
interpreted in terms of decomposition into computational processes and in terms of
implementation by concurrent processes.
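
The equivalence of the two point-Jacobi representations can be checked numerically. The sketch below assumes 0-based component indices and a small hypothetical linear operator `F`; the helper `run` is an illustrative name, not part of the paper. After $n \cdot k$ steps of the component-by-component representation, the iterate should coincide with the $k$-th iterate of the all-at-once representation.

```python
def run(F, x0, J, s, steps):
    """Iterates x(0..steps) of the asynchronous iteration defined by J and s."""
    n = len(x0)
    hist = [list(x0)]
    for j in range(1, steps + 1):
        z = [hist[s(i, j)][i] for i in range(n)]   # z_i = x_i(s_i(j))
        Fz = F(z)
        x = list(hist[-1])
        for i in J(j):
            x[i] = Fz[i]
        hist.append(x)
    return hist


# Hypothetical 3x3 example operator F(x) = Ax + b.
A = [[0.0, 0.2, 0.1], [0.3, 0.0, 0.2], [0.1, 0.4, 0.0]]
b = [1.0, 2.0, 3.0]


def F(x):
    return [sum(a * xi for a, xi in zip(row, x)) + bi for row, bi in zip(A, b)]


n, m = 3, 10
x0 = [0.0, 0.0, 0.0]

# First representation: all components at once, using the previous iterate.
all_at_once = run(F, x0, lambda j: range(n), lambda i, j: j - 1, m)

# Second representation: one component per step, all reading iterate n*floor((j-1)/n).
one_by_one = run(F, x0, lambda j: [(j - 1) % n],
                 lambda i, j: n * ((j - 1) // n), n * m)

same = all(one_by_one[n * k] == all_at_once[k] for k in range(m + 1))
```

Both runs perform the same floating-point operations on the same inputs, so the comparison is exact rather than approximate.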

The iterative method proposed by Robert, Charnay and Musy (itération chaotique
série-parallèle [6]) can be obtained as a special case of an asynchronous iteration in
which $s_i(j) = j-1$ (for all $i = 1, \ldots, n$ and $j = 1, 2, \ldots$). This corresponds to a strictly
sequential computation of sets of components. The choice of the components within a set
is arbitrary and the calculations of their values can be done simultaneously, but the
evaluation of a new set of components cannot be started before all components of the
previous set have been computed and their new values relaxed. The goal of their
research was to show that, for example, in the iterative solution of linear systems
resulting from the application of the method of finite differences to partial differential
equations, it is possible to concentrate the computations more on those points of the grid
where the convergence is slower than on other nodes. This is not the case with ordinary
iterative methods, for which any component is iterated as many times as any other
component.

Chazan and Miranker [1] have proposed a chaotic relaxation scheme to solve a
linear system. Our definition of an asynchronous iterative method is very similar to the
definition they give for a chaotic iterative scheme. Our definition, however, does not
have the restriction they impose, namely (with our notations) that $j - s_i(j)$ has to be
uniformly bounded by some fixed integer, say $s$ (for all $i = 1, \ldots, n$ and $j = 1, 2, \ldots$). This
means that, in the evaluation of the $j$-th iterate, only values of the components of the $s$
preceding iterates can be used. From a practical point of view, in an actual
implementation of such a scheme on an asynchronous multiprocessor, this requires a strong
assumption about the relative speeds of the different processors, about the scheduling
policy of the supervisory system, and about the implementation of the computations in
general. There is no way to guarantee this assumption without some form of
synchronization (which is precisely what we want to avoid).

Although all chaotic relaxation methods (as presented in [1] or [4]) can be
identified as asynchronous iterations, the converse is not true, as is illustrated by the
following example. Let $F$ be an operator from $\mathbb{R}^2$ into itself. Assume we have two
processes $P_1$ and $P_2$ attached to the evaluations of the first and second components,
respectively. To avoid synchronization, the processes always use in an evaluation the
values of the components currently available at the beginning of the computation. If we
assume that it always takes 1 unit of time for $P_1$ to perform the evaluation of $x_1$ and it
takes $k$ units of time for $P_2$ to perform the $k$-th evaluation of $x_2$, then the quantity
$j - s_2(j)$ grows as $\sqrt{j}$, which is unbounded. This iteration is a legitimate asynchronous
iteration; it is not, however, allowed in the setting of [1] and [4].
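
The growth of the delay in this two-process example can be tabulated with a short sketch. It rests on several assumptions that the text does not spell out: iterates are numbered in the order in which evaluations start, $P_2$ receives its index before $P_1$ when both start at the same instant, and $s_2(j)$ is taken as large as the timing allows. The function name `lags` is hypothetical.

```python
def lags(num_p2_evals):
    """Return (j, j - s_2(j)) for every iterate j started by P_1.

    P_1 starts one evaluation per time unit; the k-th evaluation of P_2
    lasts k time units, so P_1 starts k evaluations while it is running.
    """
    out = []
    j = 0                                  # index of the last iterate started
    for k in range(1, num_p2_evals + 1):
        j += 1                             # P_2 starts its k-th evaluation
        a_k = j                            # ... which defines iterate a_k
        for _ in range(k):
            j += 1                         # P_1 starts iterate j
            # The new x_2 is not finished yet, so the freshest usable value
            # is the one carried by iterate a_k - 1.
            out.append((j, j - (a_k - 1)))
    return out


pairs = lags(100)
last_j, last_lag = pairs[-1]
```

Under these assumptions the largest delay reached during the $k$-th evaluation of $P_2$ is $k+1$, at an iterate index of about $k^2/2$, which is consistent with growth of order $\sqrt{j}$.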
3 - Contracting operators

In the next section we shall give a sufficient condition on the operator $F$ for the
convergence  of  any  asynchronous  iteration.  Needed  definitions  are  given  in  this  section. 

3.1  - Lipchitzian  and  contracting  operators 

Contracting operators to be defined below correspond to P-contractions in
[5, p. 433], and the notion was used to obtain the results of [4] and [6].

Definition  2: 

An operator $F$ from $\mathbb{R}^n$ to $\mathbb{R}^n$ is a Lipchitzian operator on a subset $D$ of $\mathbb{R}^n$ if
there exists a non-negative $n \times n$ matrix $A$ such that:

$$|F(x) - F(y)| \le A |x - y|, \quad \forall x, y \in D, \tag{3.1}$$

where, if $z$ is a vector of $\mathbb{R}^n$ with components $z_i$, $i = 1, \ldots, n$, $|z|$ denotes the vector
with components $|z_i|$, $i = 1, \ldots, n$, and the inequality holds for every component.

The matrix $A$ will be called a Lipchitzian matrix for the operator $F$.

From this definition we can see that any Lipchitzian operator is continuous and, in
fact, uniformly continuous on $D$. However, this definition is too broad and, in particular,
we are not guaranteed of the existence and the uniqueness of a fixed point, as is shown
by the following example. Take the operator $F$ from $\mathbb{R}$ to $\mathbb{R}$ defined by $F(x) = \sqrt{x^2 + a^2}$;
this operator is Lipchitzian on $\mathbb{R}$ because

$$|F(x) - F(y)| = \left| (x-y) \, \frac{x+y}{\sqrt{x^2+a^2} + \sqrt{y^2+a^2}} \right| \le |x-y|, \quad \forall x, y \in \mathbb{R}.$$

However, the equation $x = \sqrt{x^2+1}$ (corresponding to $a = 1$) has no solution. On the other
hand, the equation $x = |x|$ (corresponding to $a = 0$) has an infinity of solutions, and, in
fact, a continuum of solutions.

We  will,  therefore,  restrict  ourselves  to  the  following  class  of  operators. 

Definition  3: 

An operator $F$ from $\mathbb{R}^n$ to $\mathbb{R}^n$ is a contracting operator on a subset $D$ of $\mathbb{R}^n$ if it
is a Lipchitzian operator on $D$ with a Lipchitzian matrix $A$ such that $\rho(A) < 1$ (where
$\rho(A)$ is the spectral radius of $A$).

The matrix $A$ will be called a contracting matrix for the operator $F$.


The fact that, unlike Lipchitzian operators, contracting operators have at most
one fixed point in the subset $D$ can be easily derived from the definition. In
addition, if we assume, for example, that $D$ is closed and that $F(D) \subset D$, we are also
guaranteed of the existence of a fixed point in the subset $D$. A proof can be found in
[5, pp. 433-434].


3.2 - Examples of contracting operators


We could have considered a more general definition for asynchronous iterative
methods by introducing a relaxation factor $\omega > 0$. This would simply consist of replacing,
in equations (2.1), the operator $F$ by the operator $F_\omega = \omega F + (1-\omega)E$, where $E$ is the
identity operator of $\mathbb{R}^n$. It follows that

$$|F_\omega(x) - F_\omega(y)| \le \omega |F(x) - F(y)| + |1-\omega| \, |x-y|,$$

and, if $F$ is a contracting operator with a contracting matrix $A$, $F_\omega$ is a Lipchitzian
operator with the Lipchitzian matrix $A_\omega = \omega A + |1-\omega| I$. The matrix $A$ being non-negative,
we have $\rho(A_\omega) = \omega \rho(A) + |1-\omega|$, and, if we choose

$$0 < \omega < 2/[1+\rho(A)], \tag{3.2}$$

$F_\omega$ is also a contracting operator. In particular, as long as condition (3.2) is satisfied, the
results of the next section also apply to asynchronous iterative methods with relaxation.
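
The relation $\rho(A_\omega) = \omega \rho(A) + |1-\omega|$ and condition (3.2) can be checked on a small case. The sketch below is restricted to $2 \times 2$ matrices so that the spectral radius follows from the quadratic formula; the matrix `A` and the helper names are hypothetical and chosen so that $\rho(A) = 0.5$.

```python
import cmath


def spectral_radius_2x2(M):
    """Spectral radius of a real 2x2 matrix via its characteristic polynomial."""
    (a, b), (c, d) = M
    tr, det = a + d, a * d - b * c
    disc = cmath.sqrt(tr * tr - 4 * det)
    return max(abs((tr + disc) / 2), abs((tr - disc) / 2))


def relaxed_matrix(A, omega):
    """A_omega = omega * A + |1 - omega| * I."""
    n = len(A)
    return [[omega * A[i][j] + (abs(1 - omega) if i == j else 0.0)
             for j in range(n)] for i in range(n)]


A = [[0.2, 0.3], [0.1, 0.4]]          # non-negative, eigenvalues 0.5 and 0.1
rho = spectral_radius_2x2(A)
bound = 2 / (1 + rho)                 # right-hand side of condition (3.2)

radii = {omega: spectral_radius_2x2(relaxed_matrix(A, omega))
         for omega in (0.5, 1.0, 1.2, 1.4)}
```

For this example the bound in (3.2) is $4/3$: the values $\omega = 0.5, 1.0, 1.2$ give a spectral radius below one, while $\omega = 1.4$ does not.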


Let $F$ be a linear operator given by $F(x) = Ax + b$, where $A$ is an $n \times n$ matrix and $b$
is a vector of $\mathbb{R}^n$. We observe that $F$ is a contracting operator if and only if $\rho(|A|) < 1$.
Therefore,  in  the  case  of  linear  operators,  the  notion  of  contracting  operators  coincides 
with  the  property  stated  by  Chazan  and  Miranker  for  their  convergence  result  [1],  and 
their  result  will  appear  as  a particular  case  of  the  theorem  of  the  next  section. 

If we now consider a linear system of equations derived from a linear elliptic
differential equation by the method of finite differences, we note that the system is
represented by $Ax = b$, where $b$ is a vector of $\mathbb{R}^n$ obtained from the boundary conditions
and $A$ is an $n \times n$ M-matrix (see, for example, [7, p. 85]). Therefore the system can be
written in the form of equation (1.1) in which $F$ is the contracting operator given by
$F(x) = (I - D^{-1}A)x + D^{-1}b$, where $D$ is the matrix composed of the diagonal elements of $A$.
This example shows, in the case of linear operators, the importance of contracting
operators.

On the other hand, non-linear contracting operators, too, constitute a very
important class. A first example is directly derived from the previous one. Elliptic
partial differential equations, obtained by the addition of a small non-linear perturbation
to a linear partial differential equation, can also be shown to give rise to (non-linear)
contracting operators.

More important, if $G$ is a non-linear operator from $\mathbb{R}^n$ into itself with the simple
root $\xi$, superlinear iterative methods have been devised to find the root $\xi$ of $G$, provided
that an initial approximation $x(0)$ sufficiently close to $\xi$ is already known. For example,
Newton's iterative method generates the sequence of iterates

$$x(i+1) = F(x(i)) = x(i) - [G'(x(i))]^{-1} G(x(i)), \quad \text{for } i = 0, 1, \ldots,$$

which converges quadratically to the root $\xi$ of $G$. In this particular example, we can
easily derive, under usual assumptions (for example, $G'$ satisfies some Lipchitz condition
in a neighborhood of $\xi$), that the Newton operator $F$ corresponding to $G$ is a contracting
operator.

In fact this result is very general. Let $F$ be an operator from $\mathbb{R}^n$ into itself with a
fixed point $\xi$. If we assume that $F$ is continuously differentiable in the set
$D_r = \{ x \mid \|x - \xi\| < r \}$ and that the derivative $F'$ vanishes at $\xi$ and satisfies a Lipchitz
condition

$$\|F'(x) - F'(y)\| \le M \|x - y\|, \quad \forall x, y \in D_r,$$

then it can be easily shown that

$$\|F(x) - F(y)\| \le 2Mr \|x - y\|, \quad \forall x, y \in D_r.$$

Therefore, by choosing the vector norm $\|x\| = |x_1| + \ldots + |x_n|$ (which only changes the
constant $M$), the operator $F$ is certainly a Lipchitzian operator with the Lipchitzian matrix
$A = [a_{ij}]$ where $a_{ij} = 2Mr$, for $i, j = 1, \ldots, n$. In particular, if we know a sufficiently close
approximation to the fixed point $\xi$ (i.e., if $r$ is small enough), the operator $F$ is also a
contracting operator. This shows that the class of contracting operators contains, under
weak conditions, all iterative functions occurring in the classical superlinear iterative
methods.
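
One way to fill in the step described as easily shown is sketched below. It assumes that the matrix norm is the one induced by the vector norm and uses the convexity of the ball $D_r$, so that the segment joining $x$ and $y$ stays in $D_r$; the intermediate point $z$ is notation introduced here.

```latex
\begin{aligned}
\|F'(z)\| &= \|F'(z) - F'(\xi)\| \le M \|z - \xi\| < M r,
  \qquad z \in D_r, \\
F(x) - F(y) &= \int_0^1 F'\bigl(y + t(x - y)\bigr)\,(x - y)\,dt, \\
\|F(x) - F(y)\| &\le \sup_{0 \le t \le 1}
  \bigl\|F'\bigl(y + t(x - y)\bigr)\bigr\| \, \|x - y\|
  \le M r \,\|x - y\| \le 2 M r \,\|x - y\|.
\end{aligned}
```

The argument actually yields the constant $Mr$, which is compatible with the weaker bound $2Mr$ used in the text.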

4 - Convergence theorem

Before stating a sufficient condition ensuring the convergence of an asynchronous
iteration, we give a characterization of a non-negative matrix with spectral radius less
than unity. An algebraic proof of this characterization can be found in [1, p. 218]. A
shorter proof, based on the continuity of the spectral radius of a matrix as a function of
its coefficients, is given below.

Lemma:

Let $A$ be a non-negative square matrix. Then $\rho(A) < 1$ if and only if there
exists a positive scalar $\omega$ and a positive vector $v$ such that:

$$Av \le \omega v \quad \text{and} \quad \omega < 1. \tag{4.1}$$

Proof:

We first assume that (4.1) holds. In this case we note that $\|A\|_v \le \omega < 1$, where the
matrix norm $\|\cdot\|_v$ is induced by the vector norm defined by:

$$\|x\|_v = \max\{ |x_i| / v_i \mid i = 1, \ldots, n \}.$$

Therefore the matrix $A$ is convergent, which implies $\rho(A) < 1$ (see, for example, [7, p. 13]).

Now assume that $\rho(A) < 1$. Let $t$ be a non-negative scalar and $A_t$ be the matrix
obtained by adding $t$ to all null coefficients of $A$. Clearly, for any positive vector $x$, we
have $Ax \le A_t x$. On the other hand, $\rho(A_t)$ is a continuous function of $t$. In particular, since
$A_0 = A$ and $\rho(A) < 1$, we can always choose $t > 0$ small enough so that $\rho(A_t) < 1$ (in fact,
we also have $\rho(A) \le \rho(A_t)$). Then let $\omega = \rho(A_t)$. As $A_t > 0$, from Perron's theorem (see, for
example, [7, p. 30]), there exists a positive eigenvector $v$ corresponding to the eigenvalue
$\omega$. The positive scalar $\omega$ and the positive vector $v$ verify $Av \le A_t v = \omega v$ with $\omega < 1$. And
this completes the proof.

This proof shows, in particular, that $\omega \ge \rho(A)$. But we also see easily that the
positive scalar $\omega$ can be chosen arbitrarily close to $\rho(A)$.

We are now able to state a sufficient condition on the operator $F$ for the
convergence of any asynchronous iteration corresponding to $F$. This result is similar to
the results obtained for the convergence of a chaotic iteration by Chazan and Miranker
[1] and by Miellou [4]. The proof given here follows the same idea as in
[1, pp. 217-218]; it does not depend, however, on the assumption that (with the notation
of definition 1) $j - s_i(j)$ is uniformly bounded by a fixed integer, for any $j = 0, 1, \ldots$ and
$i = 1, \ldots, n$.

Theorem 1:

If $F$ is a contracting operator on a closed subset $D$ of $\mathbb{R}^n$ and if $F(D) \subset D$, then
any asynchronous iteration $(F, x(0), \mathcal{J}, \mathcal{S})$ corresponding to $F$ and starting with a vector
$x(0)$ in $D$ converges to the unique fixed point of $F$ in $D$.


Proof: 


Let $\xi$ be the unique fixed point of $F$. By considering the operator $F(x+\xi) - \xi$, we may
assume, without loss of generality, that $\xi = F(\xi) = 0$. By setting $y = \xi$ in equation (3.1),
the Lipchitz condition on the operator $F$ gives:

$$|F(x)| \le A|x|, \quad \forall x \in D.$$

Let $A$ be a contracting matrix for $F$ and let $\omega$ and $v$ be as defined in the lemma.
Since $v$ is a positive vector, for any starting vector $x(0)$ we can find a positive scalar $\alpha$
such that $|x(0)| \le \alpha v$.

We will show that we can construct a sequence of indices $j_p$, $p = 0, 1, \ldots$, such that
the sequence of iterates of $(F, x(0), \mathcal{J}, \mathcal{S})$ satisfies:

$$|x(j)| \le \alpha \omega^p v \quad \text{for } j \ge j_p. \tag{4.2}$$

As $0 < \omega < 1$, this shows that $x(j) \to 0$ as $j \to \infty$, and this will prove the theorem.

We first show that inequality (4.2) holds for $p = 0$ if we choose $j_0 = 0$. That is, for
$j \ge 0$ we have:

$$|x(j)| \le \alpha v. \tag{4.3}$$

From the choice of $\alpha$, inequality (4.3) is true for $j = 0$. Assume, for induction, that
it is true for $0 \le j < k$ and consider $x(k)$. Let $z$ denote the vector with components
$z_i = x_i(s_i(k))$, for $i = 1, \ldots, n$. From definition 1, the components of $x(k)$ are given either
by $x_i(k) = x_i(k-1)$ if $i \notin J_k$, in which case $|x_i(k)| = |x_i(k-1)| \le \alpha v_i$, or by $x_i(k) = f_i(z)$ if
$i \in J_k$. In this latter case, we note that, as $s_i(k) < k$ (condition (a) of definition 1), we
have:

$$|F(z)| \le A|z| \le \alpha A v \le \alpha \omega v$$

and in particular:

$$|x_i(k)| = |f_i(z)| \le \alpha \omega v_i.$$

As $0 < \omega < 1$, in this case too we obtain $|x_i(k)| \le \alpha v_i$ and (4.3) is proved by induction,
which shows that (4.2) is true for $p = 0$ if we choose $j_0 = 0$.

Now assume that $j_p$ has been found and that inequality (4.2) holds for $0 \le p < q$. We
want to find $j_q$ and show that (4.2) also holds for $p = q$.

First define $r$ by

$$r = \min\{ k \mid \forall j \ge k, \; s_i(j) \ge j_{q-1}, \text{ for } i = 1, \ldots, n \}.$$

We see, from condition (b) of definition 1, that this number exists, and we note that, from
condition (a), we have $r > j_{q-1}$, which shows, in particular, that $|x(r)| \le \alpha \omega^{q-1} v$.

Then take $j \ge r$ and consider the components of $x(j)$. As above, let $z$ be the vector
with components $z_i = x_i(s_i(j))$. From the choice of $r$, we have $s_i(j) \ge j_{q-1}$, for $i = 1, \ldots, n$,
and this shows that $|z| \le \alpha \omega^{q-1} v$. In particular, in using the contracting property of the
operator $F$ we obtain:

$$|F(z)| \le A|z| \le \alpha \omega^{q-1} A v \le \alpha \omega^q v.$$

This inequality shows that, if $i \in J_j$, $x_i(j)$ satisfies:

$$|x_i(j)| = |f_i(z)| \le \alpha \omega^q v_i.$$

On the other hand, if $i \notin J_j$ the $i$-th component is not modified. Therefore, as soon as the
$i$-th component is updated between the $r$-th and the $j$-th iteration we have:

$$|x_i(j)| \le \alpha \omega^q v_i. \tag{4.4}$$

Now, define $j_q$ as:

$$j_q = \min\{ j \mid j \ge r \text{ and } \{1, \ldots, n\} = J_r \cup \ldots \cup J_j \}$$

(this number exists by condition (c) of definition 1); then for any $j \ge j_q$ every component
is updated at least once between the $r$-th and the $j$-th iteration and therefore inequality
(4.4) holds for $i = 1, \ldots, n$. This shows that inequality (4.2) holds for $p = q$, and this
proves the theorem.
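
The lemma and inequality (4.3) can be illustrated numerically. The sketch below is a minimal example under stated assumptions: the operator is the hypothetical linear map $F(x) = Bx$ with fixed point 0 and contracting matrix $A = |B|$, the pair $(\omega, v)$ is obtained by power iteration on $A_t$ with $t = 0.01$ (one possible way to follow the proof of the lemma), delays are drawn from a seeded random generator and bounded by 5, and one component is updated per step in cyclic order. The names `matvec` and `lemma_pair` are illustrative.

```python
import random


def matvec(M, x):
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]


def lemma_pair(A, t=0.01, iters=200):
    """Return (omega, v) with v > 0 and A v <= omega v, as in the lemma."""
    A_t = [[a if a > 0 else t for a in row] for row in A]  # add t to null entries
    v = [1.0] * len(A)
    for _ in range(iters):                                 # power iteration, A_t > 0
        w = matvec(A_t, v)
        top = max(w)
        v = [wi / top for wi in w]
    # Taking the largest ratio guarantees A_t v <= omega v componentwise.
    omega = max(wi / vi for wi, vi in zip(matvec(A_t, v), v))
    return omega, v


B = [[0.0, -0.6, 0.1], [0.2, 0.0, -0.3], [0.5, 0.1, 0.0]]  # F(x) = Bx
A = [[abs(b) for b in row] for row in B]                    # contracting matrix
omega, v = lemma_pair(A)

rng = random.Random(0)
n, steps = 3, 600
x0 = [1.0, -2.0, 3.0]
alpha = max(abs(xi) / vi for xi, vi in zip(x0, v))          # |x(0)| <= alpha v
history = [x0]
bound_holds = True
for j in range(1, steps + 1):
    # z_i = x_i(s_i(j)) with s_i(j) = max(0, j - 1 - delay), delay in 0..5
    z = [history[max(0, j - 1 - rng.randint(0, 5))][i] for i in range(n)]
    x = list(history[-1])
    i = (j - 1) % n                                         # J_j = {i}
    x[i] = matvec(B, z)[i]
    history.append(x)
    bound_holds = bound_holds and all(
        abs(xi) <= alpha * vi + 1e-12 for xi, vi in zip(x, v))
```

Every iterate stays inside the box $|x| \le \alpha v$, as inequality (4.3) states, and the final iterate is close to the fixed point 0.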


5 - The  class  of  asynchronous  iterative  methods  with  memory 

The idea behind the definition of asynchronous iterations, as presented in section
2, is to allow, in the evaluation of $F(x)$, different (and independent) processes to compute
different subsets of the components. This corresponds to a natural decomposition for the
evaluation of $F(x)$ when the operator $F$ is known explicitly by the set of functions
$f_1, \ldots, f_n$. This is not, however, always so. For example, if $F$ is the Newton operator
corresponding to a non-linear operator $G$, i.e., $F(x) = x - [G'(x)]^{-1} G(x)$, usually only the
operator $G$ is given and the operator $F$ is not known explicitly. In this particular case,
when two processors are available, a more natural decomposition, as proposed by Kung
in [3], is to have one process computing the value of $G'$ while the other process uses this
value for the evaluation of $F$. More precisely, if $x$ and $y$ are two global variables
containing the current values of the iterate and of the reciprocal of the derivative of $G$,
respectively, the two processes correspond to the two following programs.
Process 1: while (termination criterion not satisfied)

do $x \leftarrow x - y \times G(x)$.

Process 2: while (termination criterion not satisfied)

do $y \leftarrow [G'(x)]^{-1}$.

Starting with the initial values $x(0)$ and $[G'(x(0))]^{-1}$ for $x$ and $y$ respectively, the
two processes execute their programs asynchronously and use for $x$ and $y$ whatever
values are currently available when needed. They implicitly define the sequence of
iterates $x(j)$, for $j = 0, 1, \ldots$, through formulas of the form:

$$x(j) = H[x(j-1), x(k_j)], \quad \text{with } k_j \le j-1, \tag{5.1}$$

where

$$H(x, y) = x - [G'(y)]^{-1} G(x).$$

This iteration, however, is not allowed in the setting of definition 1, because, in equation
(5.1), $x(j)$ is defined in terms of two previous iterates. And this motivates the need for a
generalization of the class of asynchronous iterative methods.
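
Formula (5.1) can be simulated deterministically without real concurrency. The sketch below assumes a scalar example $G(x) = x^2 - 2$ and a hypothetical schedule in which the derivative process delivers a fresh value only every third step, so that $k_j = 3 \lfloor (j-1)/3 \rfloor \le j-1$; the function name and the `refresh` parameter are illustrative.

```python
def G(x):
    return x * x - 2.0


def dG(x):
    return 2.0 * x


def H(x, y):
    """H(x, y) = x - [G'(y)]^{-1} G(x): Newton step with the derivative taken at y."""
    return x - G(x) / dG(y)


def newton_with_stale_derivative(x0, steps, refresh=3):
    """Iterates x(j) = H[x(j-1), x(k_j)] with k_j = refresh * floor((j-1)/refresh)."""
    xs = [x0]
    for j in range(1, steps + 1):
        k_j = refresh * ((j - 1) // refresh)   # index of the iterate used for G'
        xs.append(H(xs[j - 1], xs[k_j]))
    return xs


xs = newton_with_stale_derivative(1.5, 12)
```

Between two refreshes the iteration behaves like a chord method, and each refresh restores a true Newton step; in this example the iterates approach $\sqrt{2}$.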


5.1  - Asynchronous  iteration  with  memory 

A generalization to definition 1 can be obtained by noting that, if, for $j = 2, 3, \ldots$, it
happens that $k_j = j-2$ in equation (5.1), this equation defines a sequence of iterates which
corresponds exactly to the sequence generated by an iterative method with one memory.
This remark suggests the following generalization for the problem stated in equation
(1.1).

Given an operator $F$ from $[\mathbb{R}^n]^m$ into $\mathbb{R}^n$, the problem is now to find a vector $\xi$ in
$\mathbb{R}^n$ such that:

$$\xi = \lim_{x^1 \to \xi, \ldots, x^m \to \xi} F(x^1, \ldots, x^m). \tag{5.2}$$

The vector $\xi$ will still be called a fixed point for the operator $F$.

In very much the same way as we introduced the class of asynchronous iterative
methods to solve equation (1.1), we now introduce the class of asynchronous iterative
methods with memory to solve equation (5.2).

Definition  4: 


Let $F$ be an operator from $[\mathbb{R}^n]^m$ into $\mathbb{R}^n$. An asynchronous iteration with
memory corresponding to the operator $F$ and starting with a given set of vectors
$x(0), \ldots, x(m-1)$ is a sequence $x(j)$, $j = 0, 1, \ldots$, of vectors of $\mathbb{R}^n$ defined for
$j = m, m+1, \ldots$ by:

$$x_i(j) = \begin{cases} x_i(j-1) & \text{if } i \notin J_j \\ f_i(z^1, \ldots, z^m) & \text{if } i \in J_j, \end{cases}$$

where $z^r$, $1 \le r \le m$, is the vector with components $z_i^r = x_i(s_i^r(j))$, $1 \le i \le n$. As in
definition 1, $\mathcal{J} = \{ J_j \mid j = m, m+1, \ldots \}$ is a sequence of non-empty subsets of
$\{1, \ldots, n\}$ which correspond to the subsets of components evaluated at each step of
the iteration. But the sequence $\mathcal{S}$ is now to be replaced by:

$$\mathcal{S} = \{ (s_1^1(j), \ldots, s_n^1(j), s_1^2(j), \ldots, s_n^m(j)) \mid j = m, m+1, \ldots \},$$

a sequence of elements in $[\mathbb{N}^n]^m$. In addition, while condition (c) of definition 1
remains the same, conditions (a) and (b) now become:
for each $i = 1, \ldots, n$

(a) $\max\{ s_i^r(j) \mid 1 \le r \le m \} \le j-1$, for $j = m, m+1, \ldots$,

(b) $\min\{ s_i^r(j) \mid 1 \le r \le m \}$ tends to infinity as $j$ tends to infinity.

An asynchronous iteration with memory corresponding to $F$, starting with a set
$X$ of $m$ vectors and defined with $\mathcal{J}$ and $\mathcal{S}$ will be denoted by $(F, X, \mathcal{J}, \mathcal{S})$.

For practical reasons (e.g., stability in the implementation on a computer), we might want to have the additional condition that the vectors $z^1, \ldots, z^m$ are all distinct. But this restriction is not essential for our purpose here if we assume, for example, that the operator $F$ is defined by continuity when two or more vectors are identical. This will be the case with the class of operators we will consider.

Now, in order to obtain, for asynchronous iterations with memory, a convergence result similar to the result stated in theorem 1, we need to generalize the notion of contracting operators to operators from $[\mathbb{R}^n]^m$ into $\mathbb{R}^n$.

In the remainder of the section, we will use the following notation. If $\{x^1, \ldots, x^m\}$ is a set of vectors in $\mathbb{R}^n$, $z = \max[x^1, \ldots, x^m]$ denotes the vector in $\mathbb{R}^n$ with components $z_i = \max\{ x^r_i \mid 1 \le r \le m \}$. A natural generalization of the notion of contracting operators is given in the following.


Definition 5:

An operator $F$ from $[\mathbb{R}^n]^m$ into $\mathbb{R}^n$ is an $m$-contracting operator on a subset $D$ of $\mathbb{R}^n$ if there exists a non-negative $n \times n$ matrix $A$ with spectral radius less than unity satisfying, for all $x^1, \ldots, x^m, y^1, \ldots, y^m$ in $D$,

$$|F(x^1, \ldots, x^m) - F(y^1, \ldots, y^m)| \le A \max[|x^1 - y^1|, \ldots, |x^m - y^m|].$$

The matrix $A$ will be called a contracting matrix for the operator $F$.

When $m = 1$, the preceding definition corresponds exactly to definition 3. Moreover, $m$-contracting operators have all the properties we have already mentioned for contracting operators. In particular, it is clear from the definition that $m$-contracting operators are continuous and, in fact, uniformly continuous on $D^m$. The uniqueness of a fixed point in $D$ is also easily derived. In addition, if we assume that $D$ is a closed subset of $\mathbb{R}^n$ satisfying $F(D^m) \subset D$, then we are guaranteed the existence of a fixed point in $D$: the fixed point is, for example, obtained as the limit of the sequence $x(j)$, $j = 0, 1, \ldots$, defined by:

$$x(j) = F(x(j-1), \ldots, x(j-m)), \quad j = m, m+1, \ldots,$$

which is independent of the set of starting vectors $x(0), \ldots, x(m-1)$ in $D$.

We are now able to state the analogue of theorem 1 for $m$-contracting operators in the following.

Theorem 2:

If $F$ is an $m$-contracting operator on a closed subset $D$ of $\mathbb{R}^n$ satisfying $F(D^m) \subset D$, then any asynchronous iteration with memory corresponding to the operator $F$ and starting with an arbitrary set of $m$ vectors in $D$ converges to the unique fixed point of $F$ in $D$.

Proof:

With slight modifications, the proof of this theorem is identical to the proof of theorem 1.
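The statement of theorem 2 can be illustrated numerically. The following is a minimal sketch, not taken from the paper: it simulates definition 4 with randomly chosen subsets and delays for a small 2-contracting linear operator. The names `async_iteration_with_memory`, `example_F`, and the parameter `max_delay` are hypothetical, and the bounded delay is a simplifying assumption that is stronger than the conditions of the definition.

```python
import random

def async_iteration_with_memory(F, start, steps, max_delay, seed=0):
    """Simulate an asynchronous iteration with memory (components are 0-based here).

    F takes m vectors and returns one vector; start holds x(0), ..., x(m-1).
    max_delay is a hypothetical bound on staleness, used only for this sketch.
    """
    rng = random.Random(seed)
    m, n = len(start), len(start[0])
    x = [list(v) for v in start]              # history of iterates x(0), x(1), ...
    for j in range(m, steps):
        # J_j: random subset, forced to contain j mod n so that every
        # component keeps being evaluated.
        J = {i for i in range(n) if rng.random() < 0.5} | {j % n}
        # z^r_i = x_i(s) with j-1-max_delay <= s <= j-1, so the delays satisfy
        # the max <= j-1 condition and tend to infinity with j.
        z = [[x[rng.randint(max(0, j - 1 - max_delay), j - 1)][i] for i in range(n)]
             for _ in range(m)]
        new = F(*z)
        x.append([new[i] if i in J else x[j - 1][i] for i in range(n)])
    return x[-1]

def example_F(x1, x2):
    """2-contracting example: |F(x) - F(y)| <= (0.3 P + 0.2 I) max[|x1-y1|, |x2-y2|]."""
    n = len(x1)
    return [0.3 * x1[(i + 1) % n] + 0.2 * x2[i] + 1.0 for i in range(n)]

# The fixed point of example_F is the vector with all components equal to 2.
limit = async_iteration_with_memory(example_F, [[0.0] * 3, [5.0] * 3],
                                    steps=600, max_delay=4)
print(limit)
```

The contracting matrix of `example_F` has spectral radius 0.5, so the computed vector should agree with the fixed point regardless of the random choices made for the subsets and delays.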

5.2  - Examples  of  asynchronous  iterations  with  memory 

In the beginning of this section, we considered the Asynchronous Newton's method to find the simple root $\xi$ of a non-linear operator $G$. This method led to the sequence of iterates generated by the asynchronous iteration with memory $(H, \{x(0), x(1)\}, \mathcal{J}, \mathcal{S})$, where:

$$J_j = \{1, \ldots, n\} \quad \text{for } j = 2, 3, \ldots,$$

$$s^1_i(j) = j-1, \quad s^2_i(j) = k_j \quad \text{for } j = 2, 3, \ldots \text{ and } i = 1, \ldots, n.$$

In addition, as the operator $H$ can easily be shown to be a 2-contracting operator (assuming, for example, some Lipschitz condition for the derivative of $G$ in a small neighborhood of the root $\xi$), we see that the sequence defined by equation (5.1) converges to $\xi$, provided that $k_j$ tends to infinity with $j$ (which simply states the fact that the processes eventually complete each step of their computations).

Let $F$ be an operator from $[\mathbb{R}^n]^m$ into $\mathbb{R}^n$, and let $\omega$ be a positive scalar. Consider the operator $F_\omega$ from $[\mathbb{R}^n]^{m+1}$ into $\mathbb{R}^n$ obtained from the operator $F$ by the introduction of the relaxation factor $\omega$, and defined as

$$F_\omega(x^0, x^1, \ldots, x^m) = (1-\omega)x^0 + \omega F(x^1, \ldots, x^m).$$

We first note that both $F$ and $F_\omega$ have the same fixed points (if any). We also note that, if $F$ is an $m$-contracting operator on some subset $D$ of $\mathbb{R}^n$ with the contracting matrix $A$, then, for all $x^0, x^1, \ldots, x^m, y^0, y^1, \ldots, y^m$ in $D$, the operator $F_\omega$ satisfies:

$$|F_\omega(x^0, \ldots, x^m) - F_\omega(y^0, \ldots, y^m)| \le |1-\omega|\,|x^0 - y^0| + \omega\,|F(x^1, \ldots, x^m) - F(y^1, \ldots, y^m)|$$

$$\le |1-\omega|\,|x^0 - y^0| + \omega A \max[|x^1 - y^1|, \ldots, |x^m - y^m|]$$

$$\le [\,|1-\omega| I + \omega A\,] \max[|x^0 - y^0|, \ldots, |x^m - y^m|],$$

and, provided that $0 < \omega < 2/[1+\rho(A)]$, $F_\omega$ is an $(m+1)$-contracting operator on $D$ with the contracting matrix $A_\omega = |1-\omega| I + \omega A$. This reestablishes, in a more general setting, the result mentioned in section 3.2 for asynchronous iterative methods with relaxation.
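The admissible range for the relaxation factor can be checked numerically on a small example. The sketch below is illustrative only: the matrix `A` is an arbitrary positive matrix chosen for this example, `spectral_radius` is a plain power iteration that is adequate for small positive matrices, and the function names are hypothetical.

```python
def spectral_radius(M, iterations=200):
    """Estimate the spectral radius of a small positive matrix by power iteration."""
    n = len(M)
    v = [1.0] * n
    rho = 0.0
    for _ in range(iterations):
        w = [sum(M[i][j] * v[j] for j in range(n)) for i in range(n)]
        rho = max(abs(c) for c in w)      # v is normalized so that max(v) == 1
        v = [c / rho for c in w]
    return rho

def relaxed_contracting_matrix(A, omega):
    """Return |1 - omega| I + omega A for the relaxation factor omega."""
    n = len(A)
    return [[abs(1.0 - omega) * (i == j) + omega * A[i][j] for j in range(n)]
            for i in range(n)]

A = [[0.2, 0.3], [0.4, 0.1]]              # eigenvalues 0.5 and -0.2
rho_A = spectral_radius(A)
bound = 2.0 / (1.0 + rho_A)               # upper limit for the relaxation factor
inside = spectral_radius(relaxed_contracting_matrix(A, 1.2))   # 1.2 < bound
outside = spectral_radius(relaxed_contracting_matrix(A, 1.4))  # 1.4 > bound
print(rho_A, bound, inside, outside)
```

For a non-negative matrix the spectral radius of the relaxed matrix equals the absolute value of one minus the relaxation factor, plus the relaxation factor times the spectral radius of `A`, which is why the contraction property is lost exactly when the relaxation factor exceeds `bound`.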

Many more examples of asynchronous iterations with memory can be given and, in particular, all classical iterative methods with memory can be expressed in this way. In addition, all usual super-linear iterative methods with $m$ memories can be shown (under weak conditions) to correspond to some $(m+1)$-contracting operator, therefore ensuring the convergence of any asynchronous iteration corresponding to this operator.


6 - On  the  complexity  of  asynchronous  iterations 

Let $F$ be an operator from $\mathbb{R}^n$ to itself with a fixed point $\xi$ and satisfying the assumptions of theorem 1. We now investigate some measures of efficiency for the convergence of the asynchronous iteration $(F, x(0), \mathcal{J}, \mathcal{S})$ toward the fixed point $\xi$ of $F$.

The constructive proof of the theorem already provides us with bounds for the error vector $x(j) - \xi$. In fact, if $F$ is a contracting operator with the contracting matrix $A$, we note that an estimate of the error committed with the asynchronous iteration $(F, x(0), \mathcal{J}, \mathcal{S})$ is directly obtainable from the asynchronous iteration $(A, |x(0) - \xi|, \mathcal{J}, \mathcal{S})$. This estimate is used in this section to derive bounds for the efficiency of asynchronous iterations corresponding to contracting operators. However, since it can only reflect linear convergence, this estimate is certainly not adequate to deal with all asynchronous iterations, and, in section 8, using an example, we present an analysis for an asynchronous iteration with super-linear convergence.

For convenience, we only consider the convergence in norm of the error vector $x(j) - \xi$. By choosing, for example, the norm $\|x\| = \max\{ |x_i| \mid i = 1, \ldots, n \}$, this corresponds to the worst possible case for the convergence of the components.

To measure the linear convergence of the sequence $x(j)$, $j = 0, 1, \ldots$, toward its limit $\xi$, we consider the following complexity measures often referred to in the literature. The rate of convergence of the sequence is defined as:

$$R = \liminf_{j \to \infty} \left[ (-\log \|x(j) - \xi\|)/j \right].$$

In addition, if $c_j$ is the cost associated with the evaluations of the first $j$ iterates, $x(1), \ldots, x(j)$, we define the efficiency of the sequence by:

$$E = \liminf_{j \to \infty} \left[ (-\log \|x(j) - \xi\|)/c_j \right].$$

If all logarithms are taken to the base 10, $1/R$ measures the asymptotic number of steps required to divide the error by a factor of 10, whereas $1/E$ measures the corresponding cost. We note that, if $c_j/j$ tends to some finite limit $c$ (which corresponds to the average cost per step), then the efficiency is simply given by $E = R/c$.


The costs $c_j$, $j = 1, 2, \ldots$, can be chosen according to any convenient measure. In our case, we consider the cost to correspond either to the number of evaluations of the operator $F$, or to the time to perform the evaluations. In the former case, if all components are equally hard to compute, the cost can be directly evaluated from the sequence $\mathcal{J}$ by considering

$$c_j = (|J_1| + \cdots + |J_j|)/n, \qquad (6.1)$$

where $|J_j|$ is the cardinality of the set $J_j$, i.e., the number of components evaluated at the $j$-th step of the iteration. In the latter case, the cost is better suited to deal with parallel algorithms, and can be evaluated through the classical tools of queueing theory. When it is necessary to indicate which cost measure is used in the evaluation of the efficiency, we use the notations $E_e$ if the cost is measured in number of evaluations of $F$, and $E_t$ if the cost is measured by the time needed to perform (sequentially) one evaluation of $F$.
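The two measures and the cost of equation (6.1) can be evaluated on a finite prefix of a sequence. The sketch below is illustrative: the function names are hypothetical, the error sequence is synthetic (halved at every step), and finite values of $j$ stand in for the limits in the definitions.

```python
import math

def evaluation_cost(subset_sizes, n):
    """Cost of equation (6.1): (|J_1| + ... + |J_j|) / n with j = len(subset_sizes)."""
    return sum(subset_sizes) / n

def rate_and_efficiency(error, j, cost):
    """Finite-j estimates of the rate and of the efficiency, logarithms to base 10."""
    minus_log = -math.log10(error)
    return minus_log / j, minus_log / cost

n, j = 8, 40
sizes = [n // 2] * j                  # half of the components evaluated at each step
c_j = evaluation_cost(sizes, n)       # average cost per step is 1/2
R, E = rate_and_efficiency(0.5 ** j, j, c_j)
print(c_j, R, E)
```

Here the average cost per step is one half, so the estimated efficiency is twice the estimated rate, in line with the relation between efficiency, rate, and average cost stated above.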


6.1 - General bounds

We return to the proof of theorem 1, and we use the same notations. The proof simply consists of constructing an increasing sequence of indices $j_p$, $p = 0, 1, \ldots$, satisfying

$$\|x(j) - \xi\| \le \alpha \sigma^p \quad \text{for } j \ge j_p,$$

where the positive constant $\alpha$ can be taken to be $\alpha = \|x(0) - \xi\|$. From the construction of this sequence we note that

$$j_{p+1} = j_p + r_p + t_p \quad \text{for } p = 0, 1, \ldots,$$

where $r_p$ and $t_p$ are integers chosen to satisfy: (1) starting with the index $j_p + r_p$, all evaluations of iterates do not make any more use of values of components corresponding to iterates with indices smaller than $j_p$; and (2) all components are evaluated at least once between the $(j_p + r_p)$-th and the $(j_p + r_p + t_p)$-th iterates.

Now let

$$p_j = \sup\{ p \mid r_0 + t_0 + \cdots + r_{p-1} + t_{p-1} \le j \} \quad \text{for } j = 0, 1, \ldots. \qquad (6.2)$$

Then, if we know $r_p$ and $t_p$ for $p = 0, 1, \ldots$, we can deduce a bound on $\|x(j) - \xi\|$ since

$$\|x(j) - \xi\| \le \alpha \sigma^{p_j} \quad \text{for } j = 0, 1, \ldots,$$

which shows that the sequence $x(j)$, $j = 0, 1, \ldots$, converges at least as fast as the sequence $\sigma^{p_j}$, $j = 0, 1, \ldots$, with a rate of convergence $R$ such that

$$R \ge -\left[ \liminf_{j \to \infty} (p_j/j) \right] \log \sigma.$$

And, if $c_j$ is the cost associated with the evaluations of the first $j$ iterates, we have the following bound for the efficiency:

$$E \ge -\left[ \liminf_{j \to \infty} (p_j/c_j) \right] \log \sigma.$$
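The index of equation (6.2) and the resulting error bound are easy to compute once the sequences $r_p$ and $t_p$ are known. The sketch below is illustrative: the function names are hypothetical, the sequences are given as finite lists, `alpha` and `sigma` stand for the constant and the contraction factor of the bound, and the values used in the example are arbitrary.

```python
def p_index(r, t, j):
    """Largest p such that r[0] + t[0] + ... + r[p-1] + t[p-1] <= j (equation (6.2))."""
    total, p = 0, 0
    while p < len(r) and total + r[p] + t[p] <= j:
        total += r[p] + t[p]
        p += 1
    return p

def error_bound(alpha, sigma, r, t, j):
    """Upper bound alpha * sigma ** p_j on the norm of the error at step j."""
    return alpha * sigma ** p_index(r, t, j)

# Hypothetical constant sequences r_p = 2 and t_p = 3.
r = [2] * 10
t = [3] * 10
print([p_index(r, t, j) for j in range(0, 12)])
print(error_bound(1.0, 0.5, r, t, 10))
```

With constant sequences the ratio of the index to $j$ tends to the inverse of the sum of the two constants, which is the quantity that enters the bound on the rate of convergence.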

In addition, as was noticed earlier, if $A$ is a contracting matrix for the operator $F$, $\sigma$ can be chosen arbitrarily close to $\rho(A)$. This shows that in the bounds we have just obtained we can simply replace $\sigma$ by $\rho(A)$, and this yields the following.

Theorem 3:

Let $F$ satisfy the condition of theorem 1, and let $A$ be a contracting matrix for the operator $F$. Then the asynchronous iteration $(F, x(0), \mathcal{J}, \mathcal{S})$ converges to the fixed point of $F$ with a rate of convergence

$$R \ge -\left[ \liminf_{j \to \infty} (p_j/j) \right] \log \rho(A),$$

and an efficiency

$$E \ge -\left[ \liminf_{j \to \infty} (p_j/c_j) \right] \log \rho(A),$$

where the sequence $p_j$ is defined from $\mathcal{J}$ and $\mathcal{S}$ by equation (6.2).

An example

As an illustration, we consider the parallel implementation of Jacobi's method with $k$ processes. For simplicity, we assume that $n$ is a multiple of $k$, and we set $q = n/k$.

To avoid an overhead in the selection of the components to be updated at each step of the iteration, each process is assigned to the evaluation of a fixed subset of the components. In particular, when all components are equally hard to compute, and when all processors are equally fast, it is natural to decompose the set of components into subsets of equal sizes, and, for example, to assign the first process to the evaluation of the first $q$ components, the second process to the evaluation of the next $q$ components, and so forth. Corresponding to this decomposition, a parallel implementation of Jacobi's method with $k$ processes can be represented by the asynchronous iteration $(F, x(0), \mathcal{J}, \mathcal{S})$, where $\mathcal{J}$ and $\mathcal{S}$ are defined by:

$$J_j = \{ i \mid 1 + ((j-1) \bmod k)\,q \le i \le q + ((j-1) \bmod k)\,q \} \quad \text{for } j = 1, 2, \ldots,$$

$$s_i(j) = k \lfloor (j-1)/k \rfloor \quad \text{for } j = 1, 2, \ldots \text{ and } i = 1, \ldots, n.$$

The two asynchronous iterations we introduced in section 2.2 to represent Jacobi's method correspond to the particular cases $k = 1$ and $k = n$.

It is easy to check that $r_p$ and $t_p$ are given by 1 and $k$, respectively, for $p = 0, 1, \ldots$.

This shows that $p_j = \lfloor j/k \rfloor$ and therefore

$$R(k) \ge -(\log \rho(A))/k.$$

Now, if $c_j$ measures the number of evaluations of $F$ required to compute the first $j$ iterates, using equation (6.1), we have $c_j = j/k$. This gives for the efficiency:

$$E_e(k) \ge -\log \rho(A). \qquad (6.3)$$

For all values of $k$, we obtain the same bound for the efficiency. In particular, when $F$ is the linear operator defined by $F(x) = Ax + b$, where $A$ is a non-negative $n \times n$ matrix with spectral radius less than unity, then $A$ can be chosen as a contracting matrix for $F$ and the bound (6.3) is known to be sharp.
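The representation of Jacobi's method with $k$ processes used in this example can be checked on a small system. The sketch below is illustrative only: the function names and the operator `example_F` are hypothetical, and the delay index is taken to be $k \lfloor (j-1)/k \rfloor$, which is an assumption about the intended formula (it is the only choice for which the block updates of one sweep all read the iterate available at the start of that sweep).

```python
def jacobi_sets(n, k, j):
    """Subset J_j (components numbered 1..n) and delay index s(j) for step j >= 1."""
    q = n // k                                   # n is assumed to be a multiple of k
    block = (j - 1) % k
    J = set(range(1 + block * q, q + block * q + 1))
    s = k * ((j - 1) // k)                       # same index for every component
    return J, s

def run_representation(F, x0, k, steps):
    """Run the asynchronous iteration defined by jacobi_sets; return all iterates."""
    n = len(x0)
    x = [list(x0)]
    for j in range(1, steps + 1):
        J, s = jacobi_sets(n, k, j)
        fz = F(x[s])                             # every component reads iterate x(s)
        x.append([fz[i] if (i + 1) in J else x[j - 1][i] for i in range(n)])
    return x

def example_F(x):
    """Hypothetical contracting linear operator on four components."""
    return [0.25 * (x[(i + 1) % 4] + x[(i - 1) % 4]) + 1.0 for i in range(4)]

two_processes = run_representation(example_F, [0.0] * 4, 2, 6)
one_process = run_representation(example_F, [0.0] * 4, 1, 3)
print(two_processes[6], one_process[3])
```

With two processes, six steps of the asynchronous iteration produce the same vector as three ordinary Jacobi sweeps, and each step evaluates half of the components, so the cost of equation (6.1) after $j$ steps is $j/2$.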

Since the asynchronous iteration we are considering corresponds to a parallel implementation of Jacobi's method, instead of measuring the cost by the number of evaluations of $F$, it is more natural to use the average time to perform the evaluations as a measure of the cost. Let the time unit be the average time to perform (sequentially) one evaluation of $F$. Then, if $pk \le j \le (p+1)k$, we have $c_{pk} \le c_j \le c_{(p+1)k}$ and $c_{pk} = p[\lambda_k/k]$. The expression $\lambda_k/k$ corresponds to the time for the $k$ processes to execute in parallel their computations and to synchronize their executions. The factor $\lambda_k$ is the penalty factor mentioned in [3]; it measures the overhead due to the fluctuations in the computing times of the $k$ processes, and can be evaluated if we know, for example, the distribution function for the time to evaluate $F$. In particular, we have $\lambda_1 = 1$ and, for $k \ge 2$, $\lambda_k \ge 1$ with equality only when it always takes the same constant time to evaluate $F$ (i.e., there are no fluctuations in the computing time). This cost measure yields the following bound for the efficiency:

$$E_t(k) \ge -[k/\lambda_k] \log \rho(A).$$

Again, these bounds are sharp for the linear operator we mentioned above, and the ratio $E_t(k)/E_t(1) = k/\lambda_k$ measures the speed-up achieved by using a parallel implementation with $k$ processes. We would expect the implementation with $k$ processes to be $k$ times as efficient as the sequential implementation (with $k = 1$), but this is not so because of the overhead introduced by synchronizing the $k$ processes and measured by the penalty factor $\lambda_k$.


6.2  - Additional  assumptions 

In the preceding example, we have been able to carry out the analysis for Jacobi's method (and even obtain sharp bounds on the efficiency) because the representation in terms of asynchronous iterations is known explicitly and follows a very regular pattern. This is not, however, generally so. For example, in a parallel implementation with several processes using no synchronization (as presented in section 2.1), the sequences $\mathcal{J}$ and $\mathcal{S}$ (and, therefore, the sequences $r_p$ and $t_p$, $p = 0, 1, \ldots$) are not known directly but are only defined implicitly by the processes in the course of their executions.

Below, we present alternate bounds for $R$ and $E$ under conditions often satisfied in usual implementations of asynchronous iterations. We assume that we know bounds on $r_p$ and $t_p$, and we restrict the definition of the class of asynchronous iterative methods by replacing conditions (b) and (c) of definition 1 with the following:

(b') there exists a positive integer $r$ such that, for $j = 1, 2, \ldots$ and $i = 1, \ldots, n$,
$s_i(j) \ge j - r$,

(c') there exists a non-negative integer $t$ such that, for $j = 1, 2, \ldots$,
$J_j \cup \cdots \cup J_{j+t} = \{1, \ldots, n\}$.

Condition (b') corresponds exactly to the restriction stated by Chazan and Miranker in the definition of the chaotic relaxation scheme [1]. We have criticized this condition as restricting the generality of the definition, and we have shown that it is not necessary to ensure the convergence of asynchronous iterations. In practical applications, however, this condition is often satisfied, in particular, when the computations of all components have the same complexity (which is the case with a linear operator). Condition (c') is also satisfied for most of the usual implementations of asynchronous iterations, since it is natural that (1) a process evaluates a component by using the most recently updated values of all components; and (2) two processes never evaluate the same component at the same time; in this case it follows directly that, by taking $r = t+1$, conditions (b') and (c') are equivalent.

Under the additional conditions (b') and (c'), we clearly have $r_p \le r$ and $t_p \le t$, for $p = 0, 1, \ldots$, and, therefore, $p_j \ge \lfloor j/(r+t) \rfloor$. From the bounds stated in theorem 3, we immediately obtain the following.

Corollary:

Let $F$ satisfy the condition of theorem 1, and let $A$ be a contracting matrix for $F$. If the asynchronous iteration $(F, x(0), \mathcal{J}, \mathcal{S})$ satisfies the additional conditions (b') and (c'), then it converges to the fixed point of $F$ with a rate of convergence

$$R \ge -[1/(r+t)] \log \rho(A),$$

and an efficiency

$$E \ge -\left[ \liminf_{j \to \infty} j/((r+t)c_j) \right] \log \rho(A),$$

where the sequence $p_j$ is defined from $\mathcal{J}$ and $\mathcal{S}$ by equation (6.2).
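The bounds of the corollary reduce to simple arithmetic when the average cost per step has a limit. The sketch below is illustrative: the function name and the numerical values are hypothetical, logarithms are taken to base 10, and `cost_per_step` stands for the assumed limit of $c_j/j$.

```python
import math

def corollary_bounds(rho, r, t, cost_per_step):
    """Lower bounds on the rate and on the efficiency (base 10 logarithms).

    Assumes that c_j / j tends to cost_per_step, so that j / ((r + t) c_j)
    tends to 1 / ((r + t) * cost_per_step).
    """
    rate = -math.log10(rho) / (r + t)
    efficiency = rate / cost_per_step
    return rate, efficiency

# Hypothetical values: spectral radius 0.9, r = 2, t = 1, half an evaluation per step.
rate, efficiency = corollary_bounds(0.9, 2, 1, 0.5)
print(rate, efficiency, 1.0 / efficiency)
```

The inverse of the efficiency bound is then an upper bound on the asymptotic cost needed to divide the error by a factor of 10.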

7 - Experimental results

Several asynchronous iterations have been run on C.mmp, the Carnegie-Mellon multiprocessor [8], and the actual measurements are presented in section 7.2. The different asynchronous iterative methods are described below.

7.1  - Asynchronous  iterations  experimented  with 

All  asynchronous  iterations  we  have  experimented  with  consist  of  the  parallel 
execution  of  k processes.  As  we  did  with  the  parallel  implementation  of  Jacobi’s  method, 
we  assign  to  each  of  the  processes  the  evaluation  of  a fixed  subset  of  the  components. 
Each  process  computes  cyclically  new  values  for  the  components  in  its  subset,  and  the 
methods  only  differ  by  the  choices  of  the  values  used  in  the  evaluations. 


Asynchronous Jacobi's method (AJ): For the evaluations of all components, a process uses only values of the components known at the beginning of a cycle, and the process releases all new values at the end of each cycle.

Asynchronous Gauss-Seidel's method (AGS): Same as the AJ method except that the process uses new values of the components in its subset as soon as they are known for further evaluations in the same cycle. Again, it releases the new values (for the other processes) at the end of its cycle.

Purely Asynchronous method (PA): A process computes the new values of each component by using the most recent values of all components and releases each new value immediately after its evaluation.

The PA method is certainly the easiest method to implement, and, as far as space is concerned, is clearly the most efficient one, whereas the AJ method is the worst one, since it requires from each process not only a complete duplication of all components (as of the beginning of its cycle) but still another copy of the components in its own subset. This can hardly be justified, but experimental results give useful comparisons between the AJ method and the actual Jacobi's method (also between the AGS and Gauss-Seidel's methods).

In addition, both the AJ and AGS methods require a critical section in order to read all components at the beginning of a cycle and to update the values at the end of a cycle, whereas no critical section is needed with the PA method. However, C.mmp has the drawback that no indivisible instructions exist to read or write floating point numbers (implemented on two consecutive words of memory). Therefore, if we are to implement the PA method on C.mmp, only the first 8 bits of the mantissa can be considered significant, and the admissible error in the termination criterion has to be chosen accordingly.
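The PA method described above can be sketched with ordinary threads. This is a minimal illustration in Python rather than a reconstruction of the C.mmp implementation: the function name, the stopping rule based on a residual checked by the main thread, and the small test system are all assumptions made for the example. No lock or critical section is used; each thread reads whatever values are currently stored and writes each new value immediately.

```python
import threading
import time

def purely_asynchronous_solve(A, b, k, tol=1e-10):
    """Sketch of the PA method for x = A x + b with k unsynchronized threads."""
    n = len(b)
    q = n // k                                # n is assumed to be a multiple of k
    x = [0.0] * n                             # shared iterate, accessed without locks
    stop = threading.Event()

    def worker(components):
        # Cycle over the assigned components until the main thread says stop.
        while not stop.is_set():
            for i in components:
                # Most recent values of all components are used, and the new
                # value is released immediately.
                x[i] = b[i] + sum(A[i][j] * x[j] for j in range(n))

    def residual():
        snapshot = list(x)
        return max(abs(b[i] + sum(A[i][j] * snapshot[j] for j in range(n)) - snapshot[i])
                   for i in range(n))

    threads = [threading.Thread(target=worker, args=(range(p * q, (p + 1) * q),),
                                daemon=True)
               for p in range(k)]
    for thread in threads:
        thread.start()
    while residual() > tol:                   # termination criterion (an assumption)
        time.sleep(0.001)
    stop.set()
    for thread in threads:
        thread.join()
    return x

n = 6
A = [[0.0 if i == j else 0.1 for j in range(n)] for i in range(n)]   # row sums 0.5
b = [1.0] * n
solution = purely_asynchronous_solve(A, b, k=3)   # exact solution: all components 2
print(solution)
```

The example matrix is non-negative with row sums of 0.5, so it is a contracting matrix for the operator and the iteration converges whatever the interleaving of the threads.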

7.2  - Results 

The three methods just described, as well as Jacobi's method, have been implemented on C.mmp to solve the Dirichlet problem for Laplace's equation on a rectangular domain of $\mathbb{R}^2$. Using the method of finite differences, an approximate solution to this problem can be found by solving a linear system of equations. In the experiments reported here, a regular grid has been chosen with 21x24 interior points, resulting in a linear system of size $n = 504$. This system can be represented in the form $x = F(x) = Ax + b$, where the vector $b$ is obtained from the boundary conditions, and the matrix $A$ is a (very sparse) non-negative matrix with spectral radius $\rho(A) \approx 0.991$. Since $\rho(|A|) = \rho(A) < 1$, this shows that $A$ is a contracting matrix for the operator $F$, and, therefore, that the result of theorem 1 can be applied to $F$ to ensure the convergence of each iterative method.

At the time the measurements were taken, the configuration of C.mmp included six processors, and all iterative methods were run with a number of processes $k = 1, 2, 3, 4$, and $6$. Each of the results reported here is the average of three measurements, but, since C.mmp was used in stand-alone mode during the experiments, very little difference was noted from one run to the next.

In table 1, we report for the four methods the average number of vector evaluations required to reduce (asymptotically) the error vector by a factor of 10: this corresponds to the cost measure $1/E_e$. And, in table 2, we report the average time (expressed in seconds) required to achieve this reduction: this corresponds to the cost measure $1/E_t$.

The bounds obtained from the results of the previous sections are mentioned in parentheses along with the measurements. The parameters in these bounds have been evaluated either directly (e.g., $\rho(A) \approx 0.991$), or through measurements by tracing the executions of the processes. In particular, for the AJ, AGS and PA methods, the bounds $r$ and $t$, defined in section 6.2, have been determined by observing the sequencing of the tasks performed by the different processes. Similarly, the penalty factor in Jacobi's method and the overhead due to the critical section in the AJ and AGS methods have been obtained by direct measurements: they are presented in tables 3 and 4.


          Jacobi       AJ           AGS          PA
k = 1     254 (254)    254 (254)    127 (254)    127 (254)
k = 2     254 (254)    266 (888)    142 (888)    127 (762)
k = 3     254 (254)    267 (846)    149 (846)    127 (762)
k = 4     254 (254)    273 (825)    166 (825)    129 (762)
k = 6     254 (254)    285 (804)    196 (804)    128 (762)

Table 1 - Number of evaluations required to divide the error by a factor of 10


          Jacobi       AJ           AGS          PA
k = 1     337 (337)    337 (337)    168 (337)    168 (337)
k = 2     241 (241)    211 (705)    113 (705)     84 (506)
k = 3     178 (178)    149 (471)     83 (471)     56 (337)
k = 4     153 (153)    123 (372)     75 (372)     43 (253)
k = 6     131 (131)    102 (289)     70 (289)     28 (169)

Table 2 - Time required to divide the error by a factor of 10


                              k = 1    k = 2    k = 3    k = 4    k = 6
Penalty factor $\lambda_k$    1        1.43     1.59     1.82     2.34
Wasted time (%)               0        29.9     37.1     45.1     57.3

Table 3 - Penalty factor with Jacobi's method
and percentage of the wasted time


                              k = 1    k = 2    k = 3    k = 4    k = 6
Overhead factor               1        1.20     1.26     1.35     1.62
Wasted time (%)               0        16.6     20.8     26.0     38.2

Table 4 - Critical section overhead cost with the AJ and AGS methods
and percentage of the wasted time

These results must only be considered to illustrate the behavior of asynchronous iterations, since, in particular, the two cost measures reported in tables 1 and 2 strongly depend on both the problem (i.e., the matrix $A$) and the multiprocessor system. Yet, they show a clear advantage of asynchronous methods over synchronized methods.

We note, for example, from table 3 that, with Jacobi's method, when $k = 6$ processes are used, the penalty factor is as big as 2.34. This means that about 57 percent of the time is spent by a process waiting for the other processes to finish their computations. This limits the possible speed-up to 2.6 rather than 6.

We also note that the use of critical sections, too, should be avoided, since, with the AJ or AGS methods, when 6 processes are used, about 38 percent of the time is spent waiting to enter the critical section, again limiting the possible speed-up to 3.7 rather than 6.
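The speed-up figures quoted above follow from the factors of tables 3 and 4 by simple arithmetic. The sketch below is illustrative: the function name is hypothetical, and the wasted fraction is computed as one minus the inverse of the factor, which is an assumption that matches the measured percentages of the tables only to within a few tenths of a percent.

```python
def speedup_and_waste(k, factor):
    """Possible speed-up k / factor and assumed wasted fraction 1 - 1 / factor."""
    return k / factor, 1.0 - 1.0 / factor

# Factors copied from table 3 (Jacobi) and table 4 (AJ and AGS).
penalty = {1: 1.0, 2: 1.43, 3: 1.59, 4: 1.82, 6: 2.34}
critical = {1: 1.0, 2: 1.20, 3: 1.26, 4: 1.35, 6: 1.62}

jacobi_speedup, jacobi_waste = speedup_and_waste(6, penalty[6])
aj_speedup, aj_waste = speedup_and_waste(6, critical[6])
for k in sorted(penalty):
    print(k, speedup_and_waste(k, penalty[k]), speedup_and_waste(k, critical[k]))
```

For six processes this gives a speed-up of about 2.6 with Jacobi's method and about 3.7 with the AJ and AGS methods, with roughly 57 and 38 percent of the time wasted, respectively.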


The measurements for the PA method, on the other hand, indicate that we achieve an almost full speed-up with this method (at least with a small number of processes). An obvious reason for this speed-up is the total absence of any form of synchronization; another reason, specific to the problem we have experimented with and indicated by the results of table 1, is the sparsity of the matrix $A$.

The bounds derived in section 6 have been obtained in a very general case. Yet tables 1 and 2 show that they are always within a factor between 3 and 6 of the actual measurements (except for Jacobi's method where they are sharp). In addition, we certainly could obtain much sharper bounds by carrying out the analysis for the specific problem we have experimented with (for example, by taking into account the sparsity of the matrix). In particular, a specific analysis for the PA method can easily explain the fact that $1/E_e$ is almost not influenced by the number of processes (see table 1).


8 - Asynchronous  iterations  with  super-linear  convergence 

As we already noticed, the bounds established in section 6 are certainly not adequate to measure the complexity of iterations with super-linear convergence. In this section, we use as an example the iterative method we have mentioned at the beginning of section 5 to show how an analysis of the complexity can be done for this case.

To study the convergence of a sequence $x(j)$, $j = 0, 1, \ldots$, toward its limit $\xi$, we now use the following usual measures of complexity. The order of convergence is defined as

$$\rho = \liminf_{j \to \infty} \left[ (-\log \|x(j) - \xi\|)^{1/j} \right],$$

and, as before, if $c_j$ is the cost associated with the evaluations of the first $j$ iterates, $x(1), \ldots, x(j)$, we define the efficiency of the sequence by:

$$E = \liminf_{j \to \infty} \left[ (\log(-\log \|x(j) - \xi\|))/c_j \right].$$

Again, we note that, if the average cost per step $c_j/j$ tends to some finite limit $c$ when $j$ tends to infinity, the efficiency is simply given by $E = (\log \rho)/c$. In the remainder of the section, we assume that the limit $c$ exists.

In order to find the simple root $\xi$ of an operator $G$ from $\mathbb{R}^n$ into itself, we use the Asynchronous Newton's method, AN, as implemented by the two processes described at the beginning of section 5. Let $r_i$, $i = 1, 2, \ldots$, be the number of iterates evaluated by the first process, P1, during the $i$-th evaluation of the derivative $G'$ by the second process, P2. Let $j_0 = 0$ and $j_i = r_1 + \cdots + r_i$ for $i = 1, 2, \ldots$; then $x(j_i)$, $i = 0, 1, \ldots$, is the iterate used by P2 for the $(i+1)$-st evaluation of the derivative. Starting with the two initial values $x(0)$ and $G'(x(0))$, the AN method generates with the two processes P1 and P2 the sequence of iterates $x(j)$, $j = 1, 2, \ldots$, defined by

$$x(j+1) = x(j) - [G'(x(j_{i-1}))]^{-1} G(x(j)), \quad \text{for } i = 1, 2, \ldots \text{ and } j_i < j \le j_{i+1}. \qquad (8.1)$$

The following theorem gives the measures of complexity for this sequence if we know some bounds on the sequence $r_i$, $i = 1, 2, \ldots$.


Theorem 4:

Let the initial approximation $x(0)$ be close enough to the root $\xi$, that is

$$x(0) \in D_\varepsilon = \{ x \mid \|x - \xi\| < \varepsilon \},$$

and let the derivative $G'$ satisfy some Lipschitz condition on $D_\varepsilon$:

$$\|G'(x) - G'(y)\| \le M \|x - y\|, \quad \forall x, y \in D_\varepsilon.$$

If $\varepsilon$ satisfies the condition

$$M \varepsilon \|[G'(\xi)]^{-1}\| < 2/5,$$

and if there exist some positive integers $p$ and $q$ such that

$$p \le r_i \le q, \quad \text{for } i = 1, 2, \ldots,$$

then the order of convergence, $\rho$, and the efficiency, $E$, of the sequence defined by equation (8.1) satisfy:

$$\rho \ge \lambda_p^{1/q}, \qquad (8.2)$$

and

$$E \ge (\log \lambda_p)/(qc), \qquad (8.3)$$

where $\lambda_p$ is the largest root of the equation $z^3 - z^2 - (p-1)z - 1 = 0$ (for which we can check easily that $0.4 + \sqrt{p} < \lambda_p < 0.5 + \sqrt{p}$, $p = 1, 2, \ldots$).

Proof:

The proof is easy but technical, and below we only give an outline.

Let $\alpha = M \|[G'(\xi)]^{-1}\|$, and let $c = 3\alpha/[2(1 - \alpha\epsilon)]$. From the choice of $\epsilon$, we first note
that, starting with $x(0) \in D_\epsilon$, the sequence $\|x(j) - \xi\|$, $j = 0, 1, \ldots$, is strictly decreasing and
satisfies:

$$\|x(j_i + 1) - \xi\| \le c \, \|x(j_{i-2}) - \xi\| \, \|x(j_i) - \xi\|, \quad \text{for } i = 2, 3, \ldots,$$

and

$$\|x(j+1) - \xi\| \le c \, \|x(j_{i-1}) - \xi\| \, \|x(j) - \xi\|, \quad \text{for } i = 2, 3, \ldots \text{ and } j_i < j < j_{i+1} = j_i + r_i.$$

By substitution, it follows that, for $i = 2, 3, \ldots$,

$$\|x(j_{i+1}) - \xi\| \le c^{r_i} \, \|x(j_{i-1}) - \xi\|^{r_i - 1} \, \|x(j_{i-2}) - \xi\| \, \|x(j_i) - \xi\|,$$

and, if we set $u_i = -\log(c \|x(j_i) - \xi\|)$, we obtain:

$$u_{i+1} \ge u_i + (r_i - 1) u_{i-1} + u_{i-2}, \quad \text{for } i = 2, 3, \ldots$$

Therefore, by using the lower bound on $r_i$, we deduce that

$$u_{i+1} \ge u_i + (p - 1) u_{i-1} + u_{i-2}, \quad \text{for } i = 2, 3, \ldots$$

This shows that $u_i$ tends to infinity at least as fast as $\lambda_p^i$. Therefore, the order of
convergence, $\rho'$, of the subsequence $x(j_i)$, $i = 0, 1, \ldots$, must verify $\rho' \ge \lambda_p$. The bounds
(8.2) and (8.3) are derived directly from this last inequality. ∎
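
The growth rate claimed in the last step can be observed by iterating the recurrence with equality. This is a small sketch under stated assumptions: the starting values are arbitrary positive numbers chosen here, and the name `recurrence_ratios` is hypothetical. The ratio of successive terms approaches the largest root of $z^3 - z^2 - (p-1)z - 1 = 0$, which is the characteristic equation of the recurrence.

```python
import math


def recurrence_ratios(p, terms=40):
    """Ratios u_{i+1} / u_i for u_{i+1} = u_i + (p - 1) u_{i-1} + u_{i-2}."""
    u = [1.0, 1.0, 1.0]                       # arbitrary positive starting values
    for _ in range(terms):
        u.append(u[-1] + (p - 1) * u[-2] + u[-3])
    return [u[i + 1] / u[i] for i in range(len(u) - 1)]


# For p = 4 the limiting ratio should agree with 1 + sqrt(2).
print(recurrence_ratios(4)[-1], 1 + math.sqrt(2))
```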

In particular, if the cost $c_j$ measures the number of evaluations of the operator G,
we simply have $c_j = j$, and, therefore, $E \ge (\log \lambda_p)/q$. On the other hand, if the cost
corresponds to the execution time, the efficiency will depend on the implementation
itself. For example, an implementation corresponding strictly to the generation of the
sequence described by equation (8.1) requires the use of a critical section for reading
and writing, in a block, the values of the iterates and of the derivative. The use of a
critical section introduces an overhead, but, as is done with the PA method, the overhead
can be avoided if a process uses whatever values are currently available when needed.
In this case the bounds of theorem 4 still hold, and $c$ can be given the value $c = 1$.

The parameters $p$ and $q$, too, depend on the particular implementation of the AN
method, and, especially, on the relative speeds of the processors executing the processes
P1 and P2. In practice, if the processors are equally fast, we expect $r_i$, with small
variations, to be close to $n$, and the values $p = q = n$ can give good estimates for the
efficiency of the AN method implemented with two processes.

The AN method is easily generalizable to more than two processes. If $k$ processes
are available, $k_1$ might be assigned to the evaluation of the sequence of iterates, while
$k_2 = k - k_1$ are assigned to the evaluation of the derivative. The bounds of theorem 4
still hold for this case as well, only with different values for the sequence $r_i$, $i = 1, 2, \ldots$
(or for the bounds $p$ and $q$), determined by the parallel implementations of the two
evaluations. Further results in this direction will be reported elsewhere.


9 - Extensions  of  the  results 

We  mention  below  some  direct  extensions  of  the  results  presented  in  this  paper 
and  some  points  subject  to  further  developments. 

A straightforward generalization of the results can be obtained if, instead of $R^n$, we
consider the product $P$ of $n$ Banach spaces $B_i$ with norms $|\cdot|_i$, $i = 1, \ldots, n$. In this case, if $x$
is an element of $P$, $x$ is determined by its components $x_i$, $i = 1, \ldots, n$, and $|x|$
represents the non-negative vector of $R^n$ with components $|x_i|_i$, $i = 1, \ldots, n$.

Considering only the class of linear operators, $F(x) = Ax + b$, we have noted that
the notion of contracting operators coincides with the condition that $\rho(|A|) < 1$. In [1],
Chazan and Miranker have shown that this condition is not only sufficient but also
necessary for the convergence of all chaotic iterations. This implies, in particular, that
all asynchronous iterations corresponding to a linear operator $F$ are convergent if and
only if $F$ is a contracting operator. When we also consider non-linear operators,
however, the proof given by Chazan and Miranker no longer applies, and it would
be of interest to obtain conditions on the class of operators for which all asynchronous
iterations are guaranteed to converge. Similar conditions for the convergence of a more
restricted class of iterations would also be of interest, in particular for the subclass of
asynchronous iterative methods corresponding to the additional assumptions introduced
in section 6.1.
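
The sufficiency of the condition $\rho(|A|) < 1$ in the linear case can be illustrated by a deterministic simulation in which each update reads out-of-date components. This is a sketch under assumptions made here, not the paper's experimental setup: one randomly chosen component is updated per step, every component it reads is between 0 and `max_delay` steps old, and the example matrix has absolute row sums of 0.5 (which implies $\rho(|A|) < 1$). The names `delayed_linear_iteration`, `max_delay` and `seed` are hypothetical.

```python
import random


def delayed_linear_iteration(A, b, steps, max_delay, seed=0):
    """Simulate x = A x + b where each update may read stale components.

    At every step one randomly chosen component i is recomputed from values
    that are each between 0 and max_delay steps old (clamped at step 0).
    """
    rng = random.Random(seed)
    n = len(b)
    history = [[0.0] * n]                     # history[j] is the iterate x(j)
    for j in range(steps):
        x = list(history[-1])
        i = rng.randrange(n)
        stale = [history[max(0, j - rng.randint(0, max_delay))][k]
                 for k in range(n)]
        x[i] = sum(A[i][k] * stale[k] for k in range(n)) + b[i]
        history.append(x)
    return history[-1]


# Example chosen so that the fixed point of x = A x + b is (1, 2, 3).
A = [[0.0, -0.3, 0.2], [0.25, 0.0, 0.25], [0.1, 0.4, 0.0]]
b = [1.0, 1.0, 2.1]
print(delayed_linear_iteration(A, b, steps=2000, max_delay=5))
```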

The bounds we have obtained to estimate the rate of convergence of asynchronous
iterations have been derived by considering the worst possible case, and, compared to
actual measurements, these bounds happen to be very conservative. It would certainly
be very useful to obtain bounds (or estimates) corresponding to the average behavior of
asynchronous iterations, for example, given the probability distributions of the two
sequences $\mathcal{J}$ and $\mathcal{S}$ or, more generally, given the distribution functions for the time it
takes the different processes to evaluate the components.

We have already mentioned the possibility of introducing a relaxation factor in
asynchronous iterations, and, for contracting operators, we have derived a possible range
that guarantees the convergence of all asynchronous iterations. Nothing is known,
however, about the optimal choice of the relaxation factor, for example, given directly
the asynchronous iteration through $\mathcal{J}$ and $\mathcal{S}$ or, again, given the distribution functions for
the evaluation times.

10  - Concluding  remarks 

In the implementation of most parallel algorithms, synchronization seems to be
required to ensure communication between the processes and to guarantee their
correct execution. However, the main drawback of synchronization is that it is very
time consuming and considerably degrades the performance of the algorithms. The
class of asynchronous iterative methods avoids this drawback. It includes iterations
corresponding to a parallel implementation in which the cooperating processes have a
minimum of intercommunication and do not make any use of synchronization. The Purely
Asynchronous method described in section 7.1 is a typical example of an asynchronous
iterative method.
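
The structure of an iteration with no synchronization can be sketched with threads that read and write a shared vector without locks. This is a minimal illustration under assumptions made here, not the C.mmp implementation: the operator is linear, $F(x) = Ax + b$, with absolute row sums below 1; components are split among workers in a fixed interleaved pattern; and a monitoring loop in the main thread, used only to decide when to stop, is an addition for the example. The name `purely_asynchronous_solve` and its parameters are hypothetical. In CPython the global interpreter lock serializes the threads, so the sketch shows the structure of the method rather than any speed-up.

```python
import threading
import time


def purely_asynchronous_solve(A, b, workers=2, tol=1e-10, timeout=10.0):
    """Solve x = A x + b with worker threads that never synchronize."""
    n = len(b)
    x = [0.0] * n                      # shared iterate, accessed without locks
    stop = threading.Event()

    def worker(components):
        while not stop.is_set():
            for i in components:
                # Use whatever values of x are currently available.
                x[i] = sum(A[i][k] * x[k] for k in range(n)) + b[i]

    threads = [threading.Thread(target=worker, daemon=True,
                                args=(range(w, n, workers),))
               for w in range(workers)]
    for t in threads:
        t.start()
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:  # termination test only
        residual = max(abs(sum(A[i][k] * x[k] for k in range(n)) + b[i] - x[i])
                       for i in range(n))
        if residual < tol:
            break
        time.sleep(0.001)
    stop.set()
    for t in threads:
        t.join()
    return x


# Example chosen so that the fixed point of x = A x + b is (1, 2, 3).
A = [[0.0, -0.3, 0.2], [0.25, 0.0, 0.25], [0.1, 0.4, 0.0]]
b = [1.0, 1.0, 2.1]
print(purely_asynchronous_solve(A, b))
```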

In [1], Chazan and Miranker introduced chaotic relaxation schemes requiring a
condition which can only be satisfied by using repeated checking and some form of
synchronization at each step of the iteration. Asynchronous iterative methods do not
require this condition and are more general than chaotic relaxation schemes.
Asynchronous iterations further generalize to asynchronous iterations with memory, which
allow different values of the same variable to be used within the same computation.

Using the notions of contracting operators and of m-contracting operators,
theorem 1 and theorem 2 state sufficient conditions to guarantee the convergence of any
asynchronous iterations and asynchronous iterations with memory. These conditions are
satisfied for a large class of operators.

In the second part of the paper, asynchronous iterations are evaluated from a
computational point of view, and the results of a series of actual measurements
(obtained by running asynchronous iterations on a multiprocessor) are presented. These
results fully justify the use of asynchronous iterative methods.

General bounds on the efficiency of asynchronous iterations are first derived
directly from the proof of the convergence theorem. Although these bounds are sharp for
a parallel implementation of Jacobi's method, they are of little practical use, since they
require a priori knowledge of the exact specification of each step of the iteration.
Alternative bounds are then derived under additional conditions which are usually
satisfied in practical applications. These bounds are consistent with actual measurements:
for the experiments we have run, they are always within a factor of 6 of the measurements.
In addition, we believe that these bounds can be substantially improved by taking into
account specific characteristics of the problem being solved, thereby leading to a better
understanding of asynchronous iterations. In section 8, for example, we have made a first
step in this direction by presenting an analysis of the Asynchronous Newton's method.

A series of experiments has been conducted on C.mmp, a multiprocessor system
(with 6 processors at the time the experiments were run), and several asynchronous
iterative methods have been implemented to solve a large linear system of equations.
They range from Jacobi's method, requiring a full synchronization of all the processes at
each step of the iteration, to the PA method, which requires no synchronization at all. In
between, the AJ and AGS methods are derived from the usual Jacobi and Gauss-Seidel
methods, and they require the use of a critical section.

The experimental results show a considerable advantage for the iterative method
with no synchronization at all. For a number of processes up to the number of
processors available on C.mmp, the PA method exhibits full parallelism and has an
optimal speed-up compared to Gauss-Seidel's method, the best sequential method
experimented with. The AJ and AGS methods behave very similarly: when 6
processes are used, the overhead caused by the critical section means that a process
spends 38 percent of its time waiting to enter the critical section. As is intuitively
expected, Jacobi's method has the worst behavior of all the methods considered: with 6
processes, the overhead due to the synchronization of all the processes at each step of
the iteration is about 57 percent (i.e., more than half the time a process is waiting for
the other processes to finish their computations).

On the basis of these experimental results, and for the problem we have
considered, there does not seem to be any alternative: the PA method is clearly the
most efficient one. In addition, the PA method is the easiest one to implement, and it is
also the most efficient one in terms of space.


Finally, another possibility, which has only been outlined in the paper, is the
introduction of a relaxation factor. Based only on a few experimental results (not
reported here), it is our belief that, if the relaxation factor is chosen in an optimal
way, we can expect an improvement of the Purely Asynchronous Over-Relaxation method
over the PA method similar to the improvement of the SOR method over the Gauss-Seidel
method. The optimal choice of the relaxation factor depends not only on the system
being solved, but also on the probability distributions of the execution times of
the different processes.
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